pacman::p_load(tidyverse, reshape2, ggridges, ggdist,
ggrepel, ggthemes, hrbrthemes, patchwork)Take-home Exercise 1 Part I - Creating Data Visualisation Beyond Default

Context
There are two major residential property market in Singapore, namely public and private housing. Public housing aims to meet the basic need of the general public with monthly household income less than or equal to S$14,000. For families with monthly household income more than S$14,000, they need to turn to the private residential market.
The Task
Assuming the role of a graphical editor of a median company, a minimum two and maximum three of data visualisations are prepared to reveal the private residential market and sub-markets of Singapore for the 1st quarter of 2024.
The Data

To accomplish the task, transaction data of REALIS will be used.
Downloading the Dataset
Access Dataset via SMU e-library

After logging in with SMU credentials, navigate to “Residential” tab

Under Property Types, “Select All”
Under Sale Date, select “2023 Jan” - “2024 Mar”
Click “Search”
Click “Download”
Due to the size of the dataset, it is split into multiple segments. Download all in .csv format

The Designing Tool
The data will be processed using the appropriate tidyverse family of packages and the statistical graphics will be prepared using ggplot2 and its extensions.
Getting Started
Installing and loading the required libraries
Note: Ensure that the pacman package has already been installed.
The code chunk below uses p_load() of pacman package to check if the listed packages are installed in the computer. If they are, then they will be launched into R. Otherwise, tidyverse will be installed and launched into R.
tidyverse: (i.e. readr, tidyr, dplyr, ggplot2, lubridate) for performing data science tasks such as importing, tidying, and wrangling data, as well as creating graphics based on The Grammar of Graphics
reshape2 for transforming data between wide and long formats
ggridges for creating ridgeline plots
ggdist for visualising distributions and uncertainty
ggrepel: provides geoms for ggplot2 to repel overlapping text labels.
ggthemes: provides some extra themes, geoms, and scales for ‘ggplot2’.
hrbrthemes: provides typography-centric themes and theme components for ggplot2.
patchwork: preparing composite figure created using ggplot2
Importing the Data
The data has been split into 5 .csv files
Define the path to the directory containing the CSV files.
Use
list.files()to list all CSV files in the specified directory.Loop through each CSV file, read it into a data frame using
read_csv(), and store it in a list.Use
bind_rows()to combine all data frames in the list into a single big data frame.
csv_directory <- "data/"
csv_files <- list.files(csv_directory, pattern = "\\.csv$", full.names = TRUE)
realis <- list()
for (file in csv_files) {
realis[[file]] <- read_csv(file)
}realis_all <- bind_rows(realis)View Data
- Use the
names()function to print the names of the columns in the tibble data frame. - Use the
glimpse()function to get a quick overview of the tibble data frame
col_names <- names(realis_all)
col_names [1] "Project Name" "Transacted Price ($)"
[3] "Area (SQFT)" "Unit Price ($ PSF)"
[5] "Sale Date" "Address"
[7] "Type of Sale" "Type of Area"
[9] "Area (SQM)" "Unit Price ($ PSM)"
[11] "Nett Price($)" "Property Type"
[13] "Number of Units" "Tenure"
[15] "Completion Date" "Purchaser Address Indicator"
[17] "Postal Code" "Postal District"
[19] "Postal Sector" "Planning Region"
[21] "Planning Area"
glimpse(realis_all)Rows: 26,806
Columns: 21
$ `Project Name` <chr> "THE REEF AT KING'S DOCK", "URBAN TREASU…
$ `Transacted Price ($)` <dbl> 2317000, 1823500, 1421112, 1258112, 1280…
$ `Area (SQFT)` <dbl> 882.65, 882.65, 1076.40, 1033.34, 871.88…
$ `Unit Price ($ PSF)` <dbl> 2625, 2066, 1320, 1218, 1468, 1767, 1095…
$ `Sale Date` <chr> "01 Jan 2023", "02 Jan 2023", "02 Jan 20…
$ Address <chr> "12 HARBOURFRONT AVENUE #05-32", "205 JA…
$ `Type of Sale` <chr> "New Sale", "New Sale", "New Sale", "New…
$ `Type of Area` <chr> "Strata", "Strata", "Strata", "Strata", …
$ `Area (SQM)` <dbl> 82.0, 82.0, 100.0, 96.0, 81.0, 308.7, 42…
$ `Unit Price ($ PSM)` <dbl> 28256, 22238, 14211, 13105, 15802, 19015…
$ `Nett Price($)` <chr> "-", "-", "-", "-", "-", "-", "-", "-", …
$ `Property Type` <chr> "Condominium", "Condominium", "Executive…
$ `Number of Units` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ Tenure <chr> "99 yrs from 12/01/2021", "Freehold", "9…
$ `Completion Date` <chr> "Uncompleted", "Uncompleted", "Uncomplet…
$ `Purchaser Address Indicator` <chr> "HDB", "Private", "HDB", "HDB", "HDB", "…
$ `Postal Code` <chr> "097996", "419535", "269343", "269294", …
$ `Postal District` <chr> "04", "14", "27", "27", "28", "19", "10"…
$ `Postal Sector` <chr> "09", "41", "26", "26", "79", "54", "27"…
$ `Planning Region` <chr> "Central Region", "East Region", "North …
$ `Planning Area` <chr> "Bukit Merah", "Bedok", "Yishun", "Yishu…
realis_all contains:
Public and Private residential property transaction data from 1st January 2023 to 31st March 2024.
There are 26806 rows and 21 columns.
Data Preparation
The task only requires data from the private residential market and sub-markets of Singapore for the 1st quarter of 2024.
Standardise Date Column Format
The “Sales Date” column is currently a cha type. It needs to be converted into date format.
dmy() is a function from the lubridate package that converts character strings to date format in the day-month-year (DMY) order.
Standardise Date Format and verify column type after standardisation
realis_all$`Sale Date` <- dmy(realis_all$`Sale Date`)
class(realis_all$`Sale Date`)[1] "Date"
Check data
head(realis_all$`Sale Date`)[1] "2023-01-01" "2023-01-02" "2023-01-02" "2023-01-02" "2023-01-03"
[6] "2023-01-03"
Keep only relevant rows
Filter and keep only rows that:
Sales Date that occur within Q1 2024 i.e. between 01 Jan 2024 to 31 Mar 2024 inclusive.
Purchaser Address Indicator not equal to “HDB”
In addition, any duplicate or empty rows are also removed.
q1_pte_raw <- realis_all %>%
filter(`Sale Date` >= as.Date("2024-01-01") &
`Sale Date` <= as.Date("2024-03-31"),
`Purchaser Address Indicator` != "HDB") %>%
distinct() %>%
drop_na()View Data
glimpse(q1_pte_raw)Rows: 3,567
Columns: 21
$ `Project Name` <chr> "THE LANDMARK", "POLLEN COLLECTION", "TE…
$ `Transacted Price ($)` <dbl> 2726888, 3850000, 2190000, 1954000, 3412…
$ `Area (SQFT)` <dbl> 1076.40, 1808.35, 807.30, 796.54, 1323.9…
$ `Unit Price ($ PSF)` <dbl> 2533, 2129, 2713, 2453, 2577, 838, 2007,…
$ `Sale Date` <date> 2024-01-01, 2024-01-01, 2024-01-01, 202…
$ Address <chr> "173 CHIN SWEE ROAD #22-11", "34 POLLEN …
$ `Type of Sale` <chr> "New Sale", "New Sale", "New Sale", "New…
$ `Type of Area` <chr> "Strata", "Land", "Strata", "Strata", "S…
$ `Area (SQM)` <dbl> 100.0, 168.0, 75.0, 74.0, 123.0, 328.0, …
$ `Unit Price ($ PSM)` <dbl> 27269, 22917, 29200, 26405, 27741, 9024,…
$ `Nett Price($)` <chr> "-", "-", "-", "-", "-", "-", "-", "-", …
$ `Property Type` <chr> "Condominium", "Terrace House", "Apartme…
$ `Number of Units` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ Tenure <chr> "99 yrs from 28/08/2020", "99 yrs from 0…
$ `Completion Date` <chr> "Uncompleted", "Uncompleted", "Uncomplet…
$ `Purchaser Address Indicator` <chr> "Private", "N.A", "N.A", "Private", "Pri…
$ `Postal Code` <chr> "169878", "807233", "118992", "598444", …
$ `Postal District` <chr> "03", "28", "05", "21", "21", "28", "20"…
$ `Postal Sector` <chr> "16", "80", "11", "59", "58", "79", "57"…
$ `Planning Region` <chr> "Central Region", "North East Region", "…
$ `Planning Area` <chr> "Outram", "Serangoon", "Queenstown", "Bu…
q1_pte_raw contains:
Private residential property transaction data from 1st January 2024 to 31st March 2024
There are 3567 rows and 21 columns.
Keep only relevant columns
Not all 21 columns will be used for analysis e.g. irrelevant, contains overlapping information as another column. Only relevant columns will be kept.
Columns to drop:
Type of Area: Not used in analysis
Area (SQM): Similar information as Area (SQFT)
Unit Price ($ PSM): Similar information as Unit Price ($ PSF)
Nett Price ($): Similar information as Transacted Price ($)
Purchaser Address Indicator: Not used in analysis
Postal District and Postal Sector: Overlapping information as Postal Code
The select() function is used to choose the columns to keep. However, by prefixing the column names with a minus sign (-), the function will drop the specified columns instead.
q1_pte <- q1_pte_raw %>%
select(
-`Type of Area`,
-`Area (SQM)`,
-`Unit Price ($ PSM)`,
-`Nett Price($)`,
-`Purchaser Address Indicator`,
-`Postal District`,
-`Postal Sector`
)View Data
glimpse(q1_pte)Rows: 3,567
Columns: 14
$ `Project Name` <chr> "THE LANDMARK", "POLLEN COLLECTION", "TERRA HIL…
$ `Transacted Price ($)` <dbl> 2726888, 3850000, 2190000, 1954000, 3412201, 29…
$ `Area (SQFT)` <dbl> 1076.40, 1808.35, 807.30, 796.54, 1323.97, 3530…
$ `Unit Price ($ PSF)` <dbl> 2533, 2129, 2713, 2453, 2577, 838, 2007, 1756, …
$ `Sale Date` <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01…
$ Address <chr> "173 CHIN SWEE ROAD #22-11", "34 POLLEN PLACE",…
$ `Type of Sale` <chr> "New Sale", "New Sale", "New Sale", "New Sale",…
$ `Property Type` <chr> "Condominium", "Terrace House", "Apartment", "C…
$ `Number of Units` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Tenure <chr> "99 yrs from 28/08/2020", "99 yrs from 09/12/20…
$ `Completion Date` <chr> "Uncompleted", "Uncompleted", "Uncompleted", "U…
$ `Postal Code` <chr> "169878", "807233", "118992", "598444", "589605…
$ `Planning Region` <chr> "Central Region", "North East Region", "Central…
$ `Planning Area` <chr> "Outram", "Serangoon", "Queenstown", "Bukit Tim…
q1_pte contains:
Private residential property transaction data from 1st January 2024 to 31st March 2024
There are 3567 rows and 14 columns.
Visualisation
There are various types of Properties for Private residences.
unique(q1_pte$`Property Type`)[1] "Condominium" "Terrace House" "Apartment"
[4] "Executive Condominium" "Semi-Detached House" "Detached House"
The different types in the dataset are:
Condominium
Terrace House
Apartment
Executive Condominium
Semi-Detached House
Detached House
Price Distribution
Create box plots for the price distribution of each private residence property type. geom_boxplot() displays continuous value list. It visualises five summary statistics (the median, two hinges and two whiskers), and all “outlying” points individually.
Note: For better visibility, labels parameter with the function scales::number, formats the y-axis labels to include thousands separators.

box_plot <- ggplot(data = q1_pte,
aes(x = `Property Type`, y = `Transacted Price ($)`)) +
geom_boxplot() +
labs(title = "Box Plot of Transacted Price ($) by Property Type",
x = "Property Type",
y = "Transacted Price ($)") +
scale_y_continuous(labels = scales::number) +
theme_gray() + theme(axis.text.x = element_text(angle = 45,
hjust = 1,
vjust=1))
box_plotThere are six private property types in the dataset.
The boxplot shows Detached Houses generally have the highest transacted prices. The Q1, median, and Q3 values for this property type are all substantially higher than those of other types, indicating its premium market status. Detached Houses also exhibit the highest variability in transacted prices, although there are relatively few outliers.
In contrast, both Executive Condominiums and Terrace Houses demonstrate the least variability in transacted prices, with their interquartile ranges (IQR) closely aligning with their medians. This suggests a stable and consistent pricing trend for these property types.
Executive Condominiums have the lowest transacted prices, indicating their appeal as an affordable option within the private housing market.
Apartments and Condominiums, on the other hand, show a significant number of outliers in their transacted prices, suggesting a wider range of pricing and potentially more diversity in market conditions for these property types.
Geographical Distribution
A choropleth map is used to visualize property sales across different districts or areas in Singapore.
Two data sets will be used to create the choropleth map. They are:
MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019. It can be downloaded at data.gov.sg It consists of the geographical boundary of Singapore at the planning subzone level. The data is based on URA Master Plan 2019.
Private residential property transaction data from 1st January 2024 to 31st March 2024 in tibble data frame (i.e.
q1_pte).
The code chunk below loads the following packages:
tmap: for thematic mapping
sf: for geospatial data handling
httr:
pacman::p_load(tmap,sf,httr, purrr, future, furrr)Geospatial Data
![]() |
![]() |
Import Geospatial Data
The code chunk below uses the st_read() function of sf package to import MPSZ-2019 shapefile into R as a simple feature data frame called mpsz.
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`C:\lnealicia\ISSS608\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Add coordinates
q1_pte_coor <- q1_pte %>%
left_join(
found %>% select(results.POSTAL, results.LATITUDE, results.LONGITUDE),
by = c("Postal Code" = "results.POSTAL")
) %>%
rename(
Latitude = results.LATITUDE,
Longitude = results.LONGITUDE
) %>%
filter(!is.na(Longitude) & !is.na(Latitude))Convert to a simple features tibble data frame
q1_pte_sf <- st_as_sf(q1_pte_coor,
coords = c("Longitude", "Latitude"),
crs =4326) %>%
st_transform(crs = 3414)Keep relevant columns
q1_pte_sf_plot <- q1_pte_sf %>%
select(-`Project Name`,
-`Sale Date`,
-`Address`,
-`Type of Sale`,
-`Tenure`,
-`Completion Date`)Average Transacted Price by Planning Area
avg_txn_px <- q1_pte_sf_plot %>%
group_by(`Planning Area`) %>%
summarize(
Avg_Transacted_Price = mean(`Transacted Price ($)`, na.rm = TRUE)
)
avg_txn_px <- avg_txn_px %>%
mutate(`Planning Area` = toupper(`Planning Area`))
avg_txn_px <- st_drop_geometry(avg_txn_px)Combine avg_txn_px and mpsz
Populates the average transacted price of each planning area into mpsz sf data frame. Not every planning area may have transaction data, so empty rows are dropped.
mpsz_avg_txn_px <- mpsz %>%
left_join(
avg_txn_px,
by = c("PLN_AREA_N" = "Planning Area")
) %>%
drop_na()Plotting a point simple feature layer
The overall plot shows both the average transacted price of the planning area and the distribution of private residences. If viewing the data separately is preferred, click on the relevant tabs.
tmap_mode("view")
map <- tm_shape(mpsz_avg_txn_px) +
tm_polygons(col = "Avg_Transacted_Price",
palette = "Blues",
alpha = 0.4,
style = "quantile") +
tmap_options(check.and.fix = TRUE) +
tm_shape(q1_pte_sf_plot) +
tm_dots(col = "Property Type") +
tm_view(set.zoom.limits = c(11,14))
maptmap_mode("plot")tmap_mode("view")
map2 <- tm_shape(mpsz_avg_txn_px) +
tm_polygons(col = "Avg_Transacted_Price",
palette = "Blues",
alpha = 0.5,
style = "quantile") +
tmap_options(check.and.fix = TRUE) +
tm_view(set.zoom.limits = c(11,14))
map2tmap_mode("plot")tmap_mode("view")
map3 <- tm_shape(mpsz_avg_txn_px) +
tm_polygons() +
tm_shape(q1_pte_sf_plot) +
tm_dots(col = "Property Type") +
tm_view(set.zoom.limits = c(11,14))
map3tmap_mode("plot")tmap_mode(“view”):
Designed for interactive viewing of spatial data.
Enables zooming, panning, and other interactive features, making it easier to explore the data in detail.
tmap_mode(“plot”):
Used for static plotting of spatial data that can be saved as static images e.g., PNG
Does not support interactions like zooming and panning.
Setting the mode to plot after the map is generated saves on resources and allows the code to run faster, as the plot does not require continuous updating.
Using the overall plot,

